A parallel pattern for iterative stencil + reduce
Similar resources
Parallel visual data restoration on multi-GPGPUs using stencil-reduce pattern
In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both....
Optimizing Data-Parallel Stencil
We have developed a communication optimizer that concentrates on stencil communication patterns. This optimizer has been done in the context of the UNH C* compiler that targets distributed-memory MIMD computers. Our work has two distinguishing features: The compiler/optimizer is designed to be highly portable. We achieve this goal by providing efficient support for the optimizations in the run-ti...
Stencil-Aware GPU Optimization of Iterative Solvers
Numerical solutions of nonlinear partial differential equations frequently rely on iterative Newton-Krylov methods, which linearize a finite-difference stencil-based discretization of a problem, producing a sparse matrix with regular structure. Knowledge of this structure can be used to exploit parallelism and locality of reference on modern cache-based multi- and manycore architectures, achievin...
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
In the context of jobs executed on heterogeneous clusters or Grids, load balancing is essential. Indeed, a slow machine must receive less work than a faster one; otherwise the overall job termination will be delayed. This is particularly true for Iterative-Stencil Applications where tasks are run simultaneously and are interdependent. The problem of assigning coexisting tasks to machines is call...
Efficient multicore-aware parallelization strategies for iterative stencil computations
Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel implementations for cache-based multicore architectures. Temporal cache blocking is a known advanced optimization technique, which can reduce the pressure on the memory...
Journal
Journal title: The Journal of Supercomputing
Year: 2016
ISSN: 0920-8542,1573-0484
DOI: 10.1007/s11227-016-1871-z